-
-
-
-
-
-
- Network Working Group M. Schwartz
- Request for Comments: 1273 University of Colorado
- November 1991
-
-
- A Measurement Study of Changes in
- Service-Level Reachability in the Global
- TCP/IP Internet: Goals, Experimental Design,
- Implementation, and Policy Considerations
-
- Status of this Memo
-
- This memo provides information for the Internet community. It does
- not specify an Internet standard. Distribution of this memo is
- unlimited.
-
- Abstract
-
- In this report we discuss plans to carry out a longitudinal
- measurement study of changes in service-level reachability in the
- global TCP/IP Internet. We overview our experimental design,
- considerations of network and remote site load, mechanisms used to
- control the measurement collection process, and network appropriate
- use and privacy issues, including our efforts to inform sites
- measured by this study. A list of references and information on how
- to contact the Principal Investigator are included.
-
- Introduction
-
- The global TCP/IP Internet interconnects millions of individuals at
- thousands of institutions worldwide, offering the potential for
- significant collaboration through network services and electronic
- information exchange. At the same time, such powerful connectivity
- offers many avenues for security violations, as evidenced by a number
- of well publicized events over the past few years. In response, many
- sites have imposed mechanisms to limit their exposure to security
- intrusions, ranging from disabling certain inter-site services, to
- using external gateways that only allow electronic mail delivery, to
- gateways that limit remote interactions via access control lists, to
- disconnection from the Internet. While these measures are preferable
- to the damage that could occur from security violations, taken to an
- extreme they could eventually reduce the Internet to little more than
- a means of supporting certain pre-approved point-to-point data
- transfers. Such diminished functionality could hinder or prevent the
- deployment of important new types of network services, impeding both
- research and commercial advancement.
-
- To understand the evolution of this situation, we have designed a
-
-
-
- Schwartz [Page 1]
-
- RFC 1273 A Measurement Study November 1991
-
-
- study to measure changes in Internet service-level reachability over
- a period of one year. The study considers upper layer service
- reachability instead of basic IP connectivity because the former
- indicates the willingness of organizations to participate in inter-
- organizational computing, which will be an important component of
- future wide area distributed applications.
-
- The data we gather will contribute to Internet research and
- engineering planning activities in a number of ways. The data will
- indicate the mechanisms sites use to distance themselves from
- Internet connectivity, the types of services that sites are willing
- to run (and hence the type of distributed collaboration they are
- willing to support), and variations in these characteristics as a
- function of geographic location and type of institution (commercial,
- educational, etc.). Understanding these trends will allow
- application designers and network builders to more realistically plan
- for how to support future wide area distributed applications such as
- digital library systems, information services, wide area distributed
- file systems, and conferencing and other collaboration-support
- systems. The measurements will also be of general interest, as they
- represent direct measurements of the evolution of a global electronic
- society.
-
- Clearly, a study of this nature and magnitude raises a number of
- potential concerns. In this note we overview our experimental
- design, considerations of network and remote site load, mechanisms
- used to control the measurement collection process, and our efforts
- to inform sites measured by this study, along with concomitant
- network appropriate use and privacy issues.
-
- A point we wish to stress from the outset is that this is not a study
- of network security. The experiments do not attempt to probe the
- security mechanisms of any machine on the network. The study is
- concerned solely with the evolution of network connectivity and
- service reachability.
-
- Experimental Design
-
- The study consists of a set of runs of a program, each spanning one
- to two days, repeated every two months for a period of one year
- (in January 1992, March 1992, May 1992, July 1992, September 1992,
- and November 1992). Each program run attempts to connect to 13
- different TCP services at each of approximately 12,700 Internet
- domains worldwide, recording the failure/success status of each
- attempt. The program will attempt no data transfers in either
- direction. If a connection is successful, it is simply closed and
- counted. (Note in particular that this means that the security
- mechanism behind individual network services will not be tested.)
-
- The machines on which connections are attempted will be selected at
- random from a large list of machines in the Internet, constrained
- such that at most 1 to 3 machines are contacted in any particular
- domain.
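-
- The selection constraint can be sketched as follows. This is a
- minimal illustration, not the study's software: the function name
- pick_hosts, the input mapping of domains to candidate machines, and
- the example host names are all assumptions made for this sketch.

```python
import random


def pick_hosts(hosts_by_domain, max_per_domain=3, seed=None):
    """Randomly choose up to max_per_domain machines from each domain."""
    rng = random.Random(seed)
    chosen = {}
    for domain, hosts in hosts_by_domain.items():
        unique = sorted(set(hosts))          # drop duplicates; sort for repeatability
        k = min(max_per_domain, len(unique)) # small domains contribute fewer machines
        chosen[domain] = rng.sample(unique, k)
    return chosen
```

- For example, pick_hosts({"example.edu": ["a", "b", "c", "d"]})
- returns three of the four candidates for that domain, chosen at
- random.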
-
- The services to which connections will be attempted are:
-
- __________________________________________________________________
- Port Number Service Port Number Service
- ------------------------------------------------------------------
- 13 daytime 111 Sun portmap
- 15 netstat 513 rlogin
- 21 FTP 514 rsh
- 23 telnet 540 UUCP
- 25 SMTP 543 klogin
- 53 Domain Naming System 544 krcmd, kshell
- 79 finger
- _________________________________________________________________
-
- This list was chosen to span a representative range of service
- types, each of which can be expected to be found on any machine in a
- site (so that probing random machines is meaningful). The one
- exception is the Domain Naming System, for which the machines
- to probe are selected from information obtained from the Domain
- Naming System itself. Only TCP services are tested, since the TCP
- connection mechanism allows one to determine if a server is
- running in an application-independent fashion.
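-
- A connect-only probe of this kind might look like the sketch below.
- It is an illustration under stated assumptions rather than the
- measurement program itself: the function name probe, the outcome
- labels, and the 10-second default timeout are choices made here and
- do not come from the study. The port table repeats the list above.

```python
import socket

# Port/service pairs from the table above.
SERVICES = {
    13: "daytime", 15: "netstat", 21: "FTP", 23: "telnet", 25: "SMTP",
    53: "Domain Naming System", 79: "finger", 111: "Sun portmap",
    513: "rlogin", 514: "rsh", 540: "UUCP", 543: "klogin",
    544: "krcmd, kshell",
}


def probe(host, port, timeout=10.0):
    """Attempt one TCP connection and classify the outcome.

    No application data is sent or read: the socket is closed as soon
    as the connection attempt completes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "success"
    except ConnectionRefusedError:
        return "refused"        # host answered, but no server on this port
    except socket.timeout:
        return "timeout"        # no answer within the time limit
    except OSError:
        return "error"          # name lookup failure, unreachable network, etc.
```

- Because only the TCP handshake is used, the same function works for
- every service in the table without knowing anything about the
- application protocol behind the port.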
-
- As an aside, it would be possible to retrieve "Well Known
- Service" records from the Domain Naming System, as a somewhat less
- "invasive" measurement approach. However, these records are not
- required for proper network operation, and hence are far from
- complete or consistent in the Domain Naming System. The only way
- to collect the data we want is to measure them in the fashion
- described above.
-
- Network and Remote Site Load
-
- The measurement software is careful to avoid generating
- unnecessary Internet packets, and to avoid congesting the Internet
- with too much concurrent activity. Once it has successfully
- connected to a particular service in a domain, the software never
- attempts to connect to that service on any machine in that domain
- again, for the duration of the current measurement run (i.e., the
- current 60 days). Once it has recorded 3 connection refusals at any
- machines in that domain for a service, it does not try that service
- at that domain again during the current measurement run. If it
- experiences 3 timeouts on any machine in a domain, it gives up on the
- domain, possibly to be retried again a day later (to overcome
- transient network problems). In the worst case there will be 3
- connection failures for each service at 3 different machines, which
- amounts to 37 connection requests per domain (3 for each of the 12
- services other than the Domain Naming System, and one for the Domain
- Naming System). However, the average will be much less than this.
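-
- The stopping rules above, and the worst-case count of 37, can be
- sketched as a small per-domain record. This is one possible reading
- of the rules, not the actual implementation: the class and method
- names are hypothetical, and the timeout limit is interpreted here as
- 3 timeouts in total across the domain.

```python
from collections import Counter

MAX_REFUSALS = 3   # per service, per domain
MAX_TIMEOUTS = 3   # per domain (an interpretation; see the lead-in)


class DomainState:
    """Tracks what may still be probed in one domain during one run."""

    def __init__(self):
        self.succeeded = set()      # ports that accepted a connection
        self.refusals = Counter()   # port -> refusals seen so far
        self.timeouts = 0           # timeouts seen anywhere in the domain

    def should_try(self, port):
        if self.timeouts >= MAX_TIMEOUTS:
            return False            # gave up on the whole domain
        if port in self.succeeded:
            return False            # service already counted as reachable
        return self.refusals[port] < MAX_REFUSALS

    def record(self, port, outcome):
        if outcome == "success":
            self.succeeded.add(port)
        elif outcome == "refused":
            self.refusals[port] += 1
        elif outcome == "timeout":
            self.timeouts += 1


def worst_case_attempts(n_services=13):
    """3 attempts for every service except DNS, plus 1 attempt for DNS."""
    return MAX_REFUSALS * (n_services - 1) + 1
```

- With the 13 services of this study, worst_case_attempts() evaluates
- to 3 * 12 + 1 = 37, matching the figure quoted above.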
-
- To quantify the actual Internet load, we now present some
- measurements from test runs of the measurement software that were
- performed in August 1991. In total, 50,549 Domain Naming System
- lookups were performed, and 73,760 connections were attempted. This
- measurement run completed in approximately 10 hours, never initiating
- more than 20 network operations (name lookups or connection attempts)
- concurrently. The total NSFNET backbone load from all traffic
- sources that month was approximately 5 billion packets. Therefore,
- the traffic from our measurement study amounted to less than 0.5% of
- the backbone volume on the day that the measurements were collected.
- Since the Internet contains several other backbones besides NSFNET,
- the proportionate increase in total Internet traffic was
- significantly less than 0.5%.
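-
- As a rough cross-check of these figures, the sketch below works out
- what the stated bound implies per network operation. It does not
- reproduce the original calculation: it assumes a 31-day month with
- backbone traffic spread evenly across days, and the variable names
- are chosen here for illustration.

```python
DNS_LOOKUPS = 50_549            # from the August 1991 test run
CONNECTION_ATTEMPTS = 73_760    # from the August 1991 test run
BACKBONE_PACKETS_MONTH = 5e9    # approximate NSFNET total for the month
DAYS_IN_MONTH = 31              # assumption: traffic spread evenly over August
SHARE = 0.005                   # the bound stated in the text

operations = DNS_LOOKUPS + CONNECTION_ATTEMPTS
daily_backbone_packets = BACKBONE_PACKETS_MONTH / DAYS_IN_MONTH
packet_ceiling = SHARE * daily_backbone_packets
packets_per_operation = packet_ceiling / operations

print(f"network operations:        {operations:,}")
print(f"backbone packets per day:  {daily_backbone_packets:,.0f}")
print(f"ceiling for the study:     {packet_ceiling:,.0f}")
print(f"packets per operation:     {packets_per_operation:.1f}")
```

- Under these assumptions the bound corresponds to roughly 6.5 packets
- per lookup or connection attempt, averaged over 124,309 operations.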
-
- The cost to a remote site being measured is effectively zero. From
- the above measurements, on average we attempted 5.7 connections per
- remote domain. The cost of a connection open/close sequence is quite
- small, particularly when compared to the cost of the many electronic
- mail and news transmissions that most sites experience on a given
- day.
-
- Control Over Measurement Collection Process
-
- The measurement software evolved from an earlier set of experiments
- used to measure the reach of an experimental Internet white pages
- tool called netfind [Schwartz & Tsirigotis 1991b], and has been
- refined and tested extensively over a period of two years. During
- this time it has been used in a number of experiments of increasing
- scale. The software uses several redundant checks and other
- mechanisms to ensure that careful control is maintained over the
- network operations that are performed [Schwartz & Tsirigotis 1991a].
- In addition, we monitor the progress and network loading of the
- measurements during the measurement runs, observing the log of
- connection requests in progress as well as physical and transport
- level network status (which indicate the amount of concurrent network
- activity in progress). Finally, because the measurements are
- controlled from a single centralized location, it is quite easy to
- stop the measurements at any time.
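-
- One simple way to bound concurrency and to allow a centralized stop
- is sketched below. It is not the mechanism described in [Schwartz &
- Tsirigotis 1991a]; it is a generic thread-pool pattern, with the
- function name run_bounded and the stop_event parameter introduced
- here as assumptions. The limit of 20 is the ceiling reported for the
- August 1991 test run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 20   # ceiling reported for the August 1991 test run


def run_bounded(tasks, stop_event, max_concurrent=MAX_CONCURRENT):
    """Run callables with at most max_concurrent in flight at once.

    Tasks that have not started when stop_event is set are skipped and
    reported as None. Results are returned in task order.
    """
    def guarded(task):
        if stop_event.is_set():
            return None             # operator asked to stop: do no more work
        return task()

    # The pool never runs more than max_concurrent tasks simultaneously.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(guarded, tasks))
```

- Setting the event from the controlling process (stop_event.set())
- lets operations already in progress finish while preventing any new
- ones from starting.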
-
- Network Appropriate Use and Privacy Issues
-
- When we performed our initial test runs of this study, we attempted
- to inform site administrators at each study site about this study, by
- posting a message on the USENET newsgroup "alt.security" and by
- sending individual electronic mail messages to site administrators.
- We also informed the Computer Emergency Response Team (CERT) at CMU
- of the study. As a practical matter, informing all sites turned out
- to be quite difficult. Part of the problem was that no channels
- exist to allow such information to be easily disseminated.
- Approximately half of the messages we sent to site administrators
- were returned by remote mail systems as undeliverable. Moreover, the
- network traffic and remote site administrative load caused by the
- study announcement messages far outstripped the network and
- administrative load required by the study itself. Some sites felt
- that the announcement was an unnecessary imposition on their time.
-
- In addition to these practical problems, a broad announcement of this
- study could affect the measurements it attempts to gather. Some
- sites would likely react to the announcement by changing the
- reachability of their services. Asking for explicit permission from
- sites would yield even worse methodological problems, as this would
- provide a self-selected study group consisting of sites that
- are less likely to disconnect from the Internet.
-
- In contrast with our attempts to announce the study, running the
- study without announcing it caused only a small number of site
- administrators to notice the traffic and inquire about it to either
- the CERT or to one of the responsible network contacts at the
- University of Colorado. The remote site administrator and network
- overhead of announcing the study, coupled with the practical and
- methodological problems of doing so, leads us to prefer to
- run the study without further broad announcements. Yet, to avoid
- causing alarm at a site detecting our network measurement activity,
- it makes sense to announce the study.
-
- To resolve this problem, we discussed the study with the Internet
- Activities Board, Internet Engineering Steering Group, National
- Science Foundation, representatives of several U.S. regional
- networks, and a number of individuals involved with network security,
- including the Computer Emergency Response Team, members of the
- Internet Engineering Task Force Security Area Advisory Group, and a
- member of the Lawrence Livermore National Laboratory Computer
- Incident Advisory Capability. The first part of our efforts resulted
- in the production of Internet Request For Comments (RFC) number 1262
- [Cerf 1991]. Beyond this, we have agreed that the appropriate action
- at this point is to announce the study well ahead of running it via
- the current RFC, augmented with an electronic posting that briefly
- describes the study goals and methodology and points to this RFC.
- That announcement will be posted to the Internet Engineering Task
- Force mailing list, the comp.protocols.tcp-ip USENET bulletin board,
- and the Computer Emergency Response Team's cert-tools mailing list.
- Moreover, in case a site misses these announcements, we will run the
- measurement software in a fashion intended to minimize the effort a
- site administrator might expend to determine the nature of the
- activity after detecting it. In particular, we will run the program
- from an account called "testnet" on a machine with few other users
- logged in. "Fingering" [Zimmerman 1990] this machine will indicate
- the testnet login. "Fingering" the testnet login will return
- information about this study.
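-
- For a site administrator who wants to script this check, a Finger
- query is a single line sent over TCP port 79, after which the server
- replies and closes the connection. The sketch below is a minimal
- client under that model. The function name finger and its defaults
- are assumptions made here, and the host name in the usage note is a
- placeholder, since this memo does not name the measurement machine.

```python
import socket

FINGER_PORT = 79


def finger(host, user="", port=FINGER_PORT, timeout=10.0):
    """Send a one-line Finger query and return the server's reply as text.

    An empty user asks who is logged in; a login name asks about that user.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(user.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:            # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

- For example, finger("measurement-host.example") would list logged-in
- users, and finger("measurement-host.example", "testnet") would ask
- about the testnet login.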
-
- The data collected by this study is somewhat sensitive to privacy and
- security concerns, in the sense that it might be used as a "road map"
- of accessible network services. We will treat the raw data as
- private information, publishing measurements only in global
- statistical terms, divorced from the actual sites that make up the
- underlying data points. We previously carried out a study with much
- larger privacy implications than the current study [Schwartz & Wood
- 1991], and successfully masked the data to protect individual
- privacy.
-
- For Further Information
-
- Information about the general research program within which this
- study fits is available by anonymous FTP from latour.cs.colorado.edu,
- in pub/RD.Papers. This directory contains a "README" file that
- describes the overall research project (which focuses on resource
- discovery), and includes a bibliography. Particularly relevant are:
-
- o [Schwartz 1991b], a project overview;
-
- o [Schwartz 1991a], about an earlier, simpler version of the
- current study;
-
- o [Schwartz & Tsirigotis 1991b], about the netfind white pages
- tool;
-
- o [Schwartz & Tsirigotis 1991a], which considers a number of
- the techniques used in this experiment, including those for
- controlling the progress of the measurements;
-
- and
-
- o [Schwartz & Wood 1991], about an earlier study we carried out
- that raises significant potential privacy questions, for
- which we carefully masked the underlying data, presenting the
- results without sacrificing individual privacy.
-
- Also:
-
- o [Cerf 1991], IAB guidelines for Internet measurement
- activity.
-
- Once the results of this study are complete, we will publish them in
- a conference or journal, as well as by anonymous FTP.
-
- Communication With Principal Investigator
-
- If you would like to have your site removed from this study, or you
- would like to be added to the list of people who receive results from
- this study, or you would like to communicate with the Principal
- Investigator for some other reason, please send electronic mail to
- schwartz@cs.colorado.edu.
-
- References
-
- [Cerf 1991]
- Cerf, V., Editor, "Guidelines for Internet Measurement
- Activities", RFC 1262, IAB, October 1991.
-
- [Schwartz & Tsirigotis 1991a]
- Schwartz, M., and P. Tsirigotis, "Techniques for
- Supporting Wide Area Distributed Applications", Technical
- Report CU-CS-519-91, Department of Computer Science,
- University of Colorado, Boulder, Colorado, February 1991;
- Revised August 1991. Submitted for publication.
-
- [Schwartz & Tsirigotis 1991b]
- Schwartz, M., and P. Tsirigotis, "Experience with a
- Semantically Cognizant Internet White Pages Directory
- Tool", Journal of Internetworking: Research and Experience,
- 2(1), pp. 23-50, March 1991.
-
- [Schwartz 1991a]
- Schwartz, M., "The Great Disconnection?", Technical Report
- CU-CS-521-91, Department of Computer Science, University of
- Colorado, Boulder, Colorado, February 1991.
-
- [Schwartz & Wood 1991]
- Schwartz, M., and D. Wood, "A Measurement Study of
- Organizational Properties in the Global Electronic Mail
- Community", Technical Report CU-CS-482-90, Department of
- Computer Science, University of Colorado, Boulder, Colorado,
- August 1990; Revised July 1991. Submitted for publication.
-
- [Schwartz 1991b]
- Schwartz, M., "Resource Discovery in the Global Internet",
- Technical Report CU-CS-555-91, Department of Computer
- Science, University of Colorado, Boulder, Colorado,
- November 1991. Submitted for publication.
-
- [Zimmerman 1990]
- Zimmerman, D., "The Finger User Information Protocol",
- RFC 1194, Center for Discrete Mathematics and Theoretical
- Computer Science, November 1990.
-
- Security Considerations
-
- Security issues are discussed in the "Network Appropriate Use and
- Privacy Issues" section.
-
- Author's Address
-
- Michael F. Schwartz
- Department of Computer Science
- Campus Box 430
- University of Colorado
- Boulder, Colorado 80309-0430
-
- Phone: (303) 492-3902
-
- EMail: schwartz@cs.colorado.edu
-